Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 15326 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.5 MiB |
| Average record size in memory | 104.0 B |
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 4 |
Alerts
| city is highly correlated with city_development_index | High correlation |
|---|---|
| city_development_index is highly correlated with city | High correlation |
| relevent_experience is highly correlated with last_new_job | High correlation |
| last_new_job is highly correlated with relevent_experience | High correlation |
| df_index is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| major_discipline has 202 (1.3%) zeros | Zeros |
| experience has 445 (2.9%) zeros | Zeros |
| company_size has 1687 (11.0%) zeros | Zeros |
| company_type has 519 (3.4%) zeros | Zeros |
| last_new_job has 6519 (42.5%) zeros | Zeros |
Reproduction
| Analysis started | 2022-02-20 15:11:22.200691 |
|---|---|
| Analysis finished | 2022-02-20 15:11:42.159983 |
| Duration | 19.96 seconds |
| Software version | pandas-profiling v3.1.0 |
df_index
Real number (ℝ≥0)
| Distinct | 15326 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9561.503719 |
| Minimum | 0 |
|---|---|
| Maximum | 19157 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 119.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 945.25 |
| Q1 | 4751.5 |
| median | 9567.5 |
| Q3 | 14354.75 |
| 95-th percentile | 18194.75 |
| Maximum | 19157 |
| Range | 19157 |
| Interquartile range (IQR) | 9603.25 |
Descriptive statistics
| Standard deviation | 5540.3585 |
|---|---|
| Coefficient of variation (CV) | 0.5794442655 |
| Kurtosis | -1.204649403 |
| Mean | 9561.503719 |
| Median Absolute Deviation (MAD) | 4801 |
| Skewness | 0.001740770794 |
| Sum | 146539606 |
| Variance | 30695572.3 |
| Monotonicity | Not monotonic |
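Several entries in the statistics above are derived from others, which gives a quick consistency check on the report. The following is a minimal sketch that uses only the values printed in these tables; the variable names are illustrative and are not part of pandas-profiling.

```python
from math import isclose

# Values copied from the statistics tables above.
mean = 9561.503719
std = 5540.3585
q1, q3 = 4751.5, 14354.75
minimum, maximum = 0, 19157

# Derived statistics, recomputed from the reported ones.
cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation

# Compare with the figures printed in the report (tolerance covers rounding).
assert isclose(cv, 0.5794442655, rel_tol=1e-6)
assert iqr == 9603.25
assert value_range == 19157
assert isclose(variance, 30695572.3, rel_tol=1e-6)
```

The same relations should hold for every numeric variable in the report, so the check can be repeated with the other tables' values.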
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 17855 | 1 | < 0.1% |
| 8367 | 1 | < 0.1% |
| 3521 | 1 | < 0.1% |
| 4482 | 1 | < 0.1% |
| 14442 | 1 | < 0.1% |
| 2681 | 1 | < 0.1% |
| 16789 | 1 | < 0.1% |
| 4463 | 1 | < 0.1% |
| 4142 | 1 | < 0.1% |
| 2231 | 1 | < 0.1% |
| Other values (15316) | 15316 | 99.9% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 19157 | 1 | < 0.1% |
| 19156 | 1 | < 0.1% |
| 19154 | 1 | < 0.1% |
| 19153 | 1 | < 0.1% |
| 19152 | 1 | < 0.1% |
| 19151 | 1 | < 0.1% |
| 19150 | 1 | < 0.1% |
| 19148 | 1 | < 0.1% |
| 19147 | 1 | < 0.1% |
| 19146 | 1 | < 0.1% |
city
Real number (ℝ≥0)
| Distinct | 122 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 44.20161817 |
| Minimum | 0 |
|---|---|
| Maximum | 122 |
| Zeros | 18 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 119.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 5 |
| median | 48 |
| Q3 | 64 |
| 95-th percentile | 104 |
| Maximum | 122 |
| Range | 122 |
| Interquartile range (IQR) | 59 |
Descriptive statistics
| Standard deviation | 35.51183338 |
|---|---|
| Coefficient of variation (CV) | 0.8034057316 |
| Kurtosis | -1.015955959 |
| Mean | 44.20161817 |
| Median Absolute Deviation (MAD) | 35 |
| Skewness | 0.4051720284 |
| Sum | 677434 |
| Variance | 1261.09031 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 3493 | 22.8% |
| 64 | 2128 | 13.9% |
| 48 | 1248 | 8.1% |
| 13 | 1062 | 6.9% |
| 49 | 679 | 4.4% |
| 30 | 459 | 3.0% |
| 95 | 347 | 2.3% |
| 103 | 251 | 1.6% |
| 6 | 243 | 1.6% |
| 4 | 243 | 1.6% |
| Other values (112) | 5173 | 33.8% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18 | 0.1% |
| 1 | 69 | 0.5% |
| 2 | 220 | 1.4% |
| 3 | 56 | 0.4% |
| 4 | 243 | 1.6% |
| 5 | 3493 | 22.8% |
| 6 | 243 | 1.6% |
| 7 | 65 | 0.4% |
| 8 | 6 | < 0.1% |
| 9 | 5 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 122 | 79 | 0.5% |
| 121 | 70 | 0.5% |
| 120 | 88 | 0.6% |
| 119 | 22 | 0.1% |
| 118 | 19 | 0.1% |
| 117 | 36 | 0.2% |
| 116 | 153 | 1.0% |
| 115 | 17 | 0.1% |
| 114 | 56 | 0.4% |
| 113 | 20 | 0.1% |
city_development_index
Real number (ℝ≥0)
| Distinct | 93 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 63.10622472 |
| Minimum | 0 |
|---|---|
| Maximum | 92 |
| Zeros | 15 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 119.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 37 |
| median | 82 |
| Q3 | 85 |
| 95-th percentile | 90 |
| Maximum | 92 |
| Range | 92 |
| Interquartile range (IQR) | 48 |
Descriptive statistics
| Standard deviation | 29.14003885 |
|---|---|
| Coefficient of variation (CV) | 0.4617617197 |
| Kurtosis | -0.9975403599 |
| Mean | 63.10622472 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | -0.825825199 |
| Sum | 967166 |
| Variance | 849.141864 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 85 | 4172 | 27.2% |
| 14 | 2128 | 13.9% |
| 82 | 1248 | 8.1% |
| 90 | 1062 | 6.9% |
| 27 | 546 | 3.6% |
| 78 | 459 | 3.0% |
| 91 | 411 | 2.7% |
| 67 | 347 | 2.3% |
| 88 | 243 | 1.6% |
| 57 | 243 | 1.6% |
| Other values (83) | 4467 | 29.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 15 | 0.1% |
| 1 | 24 | 0.2% |
| 2 | 5 | < 0.1% |
| 3 | 11 | 0.1% |
| 4 | 4 | < 0.1% |
| 5 | 11 | 0.1% |
| 6 | 5 | < 0.1% |
| 7 | 75 | 0.5% |
| 8 | 195 | 1.3% |
| 9 | 47 | 0.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| 92 | 70 | 0.5% |
| 91 | 411 | 2.7% |
| 90 | 1062 | 6.9% |
| 89 | 144 | 0.9% |
| 88 | 243 | 1.6% |
| 87 | 109 | 0.7% |
| 86 | 9 | 0.1% |
| 85 | 4172 | 27.2% |
| 84 | 79 | 0.5% |
| 83 | 155 | 1.0% |
gender
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 898.1 KiB |
| 1.0 | 13885 |
|---|---|
| 0.0 | 1235 |
| 2.0 | 206 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 13885 | 90.6% |
| 0.0 | 1235 | 8.1% |
| 2.0 | 206 | 1.3% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 13885 | 90.6% |
| 0.0 | 1235 | 8.1% |
| 2.0 | 206 | 1.3% |
relevent_experience
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 868.2 KiB |
| 0 | 11038 |
|---|---|
| 1 | 4288 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11038 | 72.0% |
| 1 | 4288 | 28.0% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11038 | 72.0% |
| 1 | 4288 | 28.0% |
enrolled_university
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 898.1 KiB |
| 2.0 | 11264 |
|---|---|
| 0.0 | 3080 |
| 1.0 | 982 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2.0 |
|---|---|
| 2nd row | 2.0 |
| 3rd row | 2.0 |
| 4th row | 2.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.0 | 11264 | 73.5% |
| 0.0 | 3080 | 20.1% |
| 1.0 | 982 | 6.4% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
|---|---|---|
| 2.0 | 11264 | 73.5% |
| 0.0 | 3080 | 20.1% |
| 1.0 | 982 | 6.4% |
education_level
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 898.1 KiB |
| 0.0 | 9514 |
|---|---|
| 2.0 | 3535 |
| 1.0 | 1667 |
| 3.0 | 358 |
| 4.0 | 252 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 4.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 9514 | 62.1% |
| 2.0 | 3535 | 23.1% |
| 1.0 | 1667 | 10.9% |
| 3.0 | 358 | 2.3% |
| 4.0 | 252 | 1.6% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
|---|---|---|
| 0.0 | 9514 | 62.1% |
| 2.0 | 3535 | 23.1% |
| 1.0 | 1667 | 10.9% |
| 3.0 | 358 | 2.3% |
| 4.0 | 252 | 1.6% |
major_discipline
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.716886337 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 202 |
| Zeros (%) | 1.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 119.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 5 |
| median | 5 |
| Q3 | 5 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.9522556656 |
|---|---|
| Coefficient of variation (CV) | 0.2018822582 |
| Kurtosis | 11.41194581 |
| Mean | 4.716886337 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -3.495353968 |
| Sum | 72291 |
| Variance | 0.9067908526 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 13838 | 90.3% |
| 2 | 529 | 3.5% |
| 4 | 306 | 2.0% |
| 1 | 267 | 1.7% |
| 0 | 202 | 1.3% |
| 3 | 184 | 1.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 202 | 1.3% |
| 1 | 267 | 1.7% |
| 2 | 529 | 3.5% |
| 3 | 184 | 1.2% |
| 4 | 306 | 2.0% |
| 5 | 13838 | 90.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 13838 | 90.3% |
| 4 | 306 | 2.0% |
| 3 | 184 | 1.2% |
| 2 | 529 | 3.5% |
| 1 | 267 | 1.7% |
| 0 | 202 | 1.3% |
experience
Real number (ℝ≥0)
| Distinct | 22 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.92659533 |
| Minimum | 0 |
|---|---|
| Maximum | 21 |
| Zeros | 445 |
| Zeros (%) | 2.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 119.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 7 |
| median | 14 |
| Q3 | 19 |
| 95-th percentile | 21 |
| Maximum | 21 |
| Range | 21 |
| Interquartile range (IQR) | 12 |
Descriptive statistics
| Standard deviation | 6.609195333 |
|---|---|
| Coefficient of variation (CV) | 0.5112866277 |
| Kurtosis | -0.9501779516 |
| Mean | 12.92659533 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | -0.5118464819 |
| Sum | 198113 |
| Variance | 43.68146295 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=22)
| Value | Count | Frequency (%) |
|---|---|---|
| 21 | 2674 | 17.4% |
| 15 | 1132 | 7.4% |
| 14 | 1118 | 7.3% |
| 13 | 1097 | 7.2% |
| 16 | 979 | 6.4% |
| 11 | 906 | 5.9% |
| 17 | 845 | 5.5% |
| 1 | 788 | 5.1% |
| 19 | 785 | 5.1% |
| 18 | 618 | 4.0% |
| Other values (12) | 4384 | 28.6% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 445 | 2.9% |
| 1 | 788 | 5.1% |
| 2 | 535 | 3.5% |
| 3 | 395 | 2.6% |
| 4 | 314 | 2.0% |
| 5 | 487 | 3.2% |
| 6 | 546 | 3.6% |
| 7 | 390 | 2.5% |
| 8 | 272 | 1.8% |
| 9 | 230 | 1.5% |
| Value | Count | Frequency (%) |
|---|---|---|
| 21 | 2674 | 17.4% |
| 20 | 409 | 2.7% |
| 19 | 785 | 5.1% |
| 18 | 618 | 4.0% |
| 17 | 845 | 5.5% |
| 16 | 979 | 6.4% |
| 15 | 1132 | 7.4% |
| 14 | 1118 | 7.3% |
| 13 | 1097 | 7.2% |
| 12 | 128 | 0.8% |
company_size
Real number (ℝ≥0)
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.034712254 |
| Minimum | 0 |
|---|---|
| Maximum | 7 |
| Zeros | 1687 |
| Zeros (%) | 11.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 119.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 4 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 7 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.044106366 |
|---|---|
| Coefficient of variation (CV) | 0.6735750198 |
| Kurtosis | -0.7513707788 |
| Mean | 3.034712254 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.3067150917 |
| Sum | 46510 |
| Variance | 4.178370837 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=8)
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 3394 | 22.1% |
| 1 | 2910 | 19.0% |
| 3 | 2574 | 16.8% |
| 0 | 1687 | 11.0% |
| 2 | 1634 | 10.7% |
| 7 | 1349 | 8.8% |
| 5 | 1077 | 7.0% |
| 6 | 701 | 4.6% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1687 | 11.0% |
| 1 | 2910 | 19.0% |
| 2 | 1634 | 10.7% |
| 3 | 2574 | 16.8% |
| 4 | 3394 | 22.1% |
| 5 | 1077 | 7.0% |
| 6 | 701 | 4.6% |
| 7 | 1349 | 8.8% |
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 1349 | 8.8% |
| 6 | 701 | 4.6% |
| 5 | 1077 | 7.0% |
| 4 | 3394 | 22.1% |
| 3 | 2574 | 16.8% |
| 2 | 1634 | 10.7% |
| 1 | 2910 | 19.0% |
| 0 | 1687 | 11.0% |
company_type
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.321479838 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 519 |
| Zeros (%) | 3.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 119.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 5 |
| Q3 | 5 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.376736782 |
|---|---|
| Coefficient of variation (CV) | 0.3185799386 |
| Kurtosis | 2.724603401 |
| Mean | 4.321479838 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -2.020618858 |
| Sum | 66231 |
| Variance | 1.895404166 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 11241 | 73.3% |
| 4 | 1833 | 12.0% |
| 1 | 919 | 6.0% |
| 2 | 667 | 4.4% |
| 0 | 519 | 3.4% |
| 3 | 147 | 1.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 519 | 3.4% |
| 1 | 919 | 6.0% |
| 2 | 667 | 4.4% |
| 3 | 147 | 1.0% |
| 4 | 1833 | 12.0% |
| 5 | 11241 | 73.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 11241 | 73.3% |
| 4 | 1833 | 12.0% |
| 3 | 147 | 1.0% |
| 2 | 667 | 4.4% |
| 1 | 919 | 6.0% |
| 0 | 519 | 3.4% |
last_new_job
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.800991779 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 6519 |
| Zeros (%) | 42.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 119.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 4 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 1.943962501 |
|---|---|
| Coefficient of variation (CV) | 1.079384439 |
| Kurtosis | -1.404300017 |
| Mean | 1.800991779 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.5147048644 |
| Sum | 27602 |
| Variance | 3.778990207 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6519 | 42.5% |
| 4 | 2681 | 17.5% |
| 1 | 2382 | 15.5% |
| 5 | 2055 | 13.4% |
| 2 | 846 | 5.5% |
| 3 | 843 | 5.5% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6519 | 42.5% |
| 1 | 2382 | 15.5% |
| 2 | 846 | 5.5% |
| 3 | 843 | 5.5% |
| 4 | 2681 | 17.5% |
| 5 | 2055 | 13.4% |
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 2055 | 13.4% |
| 4 | 2681 | 17.5% |
| 3 | 843 | 5.5% |
| 2 | 846 | 5.5% |
| 1 | 2382 | 15.5% |
| 0 | 6519 | 42.5% |
training_hours
Real number (ℝ≥0)
| Distinct | 241 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 61.49504111 |
| Minimum | 0 |
|---|---|
| Maximum | 240 |
| Zeros | 4 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 119.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 22 |
| median | 46 |
| Q3 | 87 |
| 95-th percentile | 173 |
| Maximum | 240 |
| Range | 240 |
| Interquartile range (IQR) | 65 |
Descriptive statistics
| Standard deviation | 51.76235137 |
|---|---|
| Coefficient of variation (CV) | 0.8417321208 |
| Kurtosis | 0.96195172 |
| Mean | 61.49504111 |
| Median Absolute Deviation (MAD) | 29 |
| Skewness | 1.22567531 |
| Sum | 942473 |
| Variance | 2679.341019 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 27 | 266 | 1.7% |
| 17 | 241 | 1.6% |
| 11 | 228 | 1.5% |
| 49 | 224 | 1.5% |
| 21 | 220 | 1.4% |
| 19 | 218 | 1.4% |
| 33 | 217 | 1.4% |
| 20 | 217 | 1.4% |
| 5 | 216 | 1.4% |
| 23 | 215 | 1.4% |
| Other values (231) | 13064 | 85.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4 | < 0.1% |
| 1 | 82 | 0.5% |
| 2 | 101 | 0.7% |
| 3 | 187 | 1.2% |
| 4 | 85 | 0.6% |
| 5 | 216 | 1.4% |
| 6 | 161 | 1.1% |
| 7 | 184 | 1.2% |
| 8 | 184 | 1.2% |
| 9 | 193 | 1.3% |
| Value | Count | Frequency (%) |
|---|---|---|
| 240 | 7 | < 0.1% |
| 239 | 11 | 0.1% |
| 238 | 10 | 0.1% |
| 237 | 9 | 0.1% |
| 236 | 10 | 0.1% |
| 235 | 10 | 0.1% |
| 234 | 9 | 0.1% |
| 233 | 11 | 0.1% |
| 232 | 7 | < 0.1% |
| 231 | 8 | 0.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
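The calculation of ρ described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, and tied values are assumed to receive the mean of their ranks.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: rho is 1, Pearson's r is below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

Because only the ranks enter the calculation, any strictly increasing transformation of either variable leaves ρ unchanged.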
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
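As a minimal sketch of the calculation just described, the function below divides the covariance by the product of the standard deviations and then checks the invariance property on made-up data. The function name `pearson_r` and the sample values are assumptions for illustration only; the rescaling uses positive scale factors, since a negative factor would flip the sign of r.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (std_x * std_y)

# Made-up sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
```

Here `r_original` and `r_rescaled` agree up to floating-point error, which is the location and scale invariance stated above.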
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
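The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that formula; the name `kendall_tau_a` is hypothetical, and tied pairs are assumed to count as neither concordant nor discordant. Statistical libraries often default to a tie-corrected variant (τ-b), so results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y: counted in neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Three observations: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

With two concordant pairs and one discordant pair out of three, `tau` is (2 - 1) / 3.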
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
First rows
| | df_index | city | city_development_index | gender | relevent_experience | enrolled_university | education_level | major_discipline | experience | company_size | company_type | last_new_job | training_hours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17855 | 64 | 14 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 1.0 | 4.0 | 5.0 | 0.0 | 89 |
| 1 | 17664 | 5 | 85 | 1.0 | 1 | 2.0 | 4.0 | 5.0 | 15.0 | 5.0 | 5.0 | 5.0 | 14 |
| 2 | 13404 | 85 | 77 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 3.0 | 2.0 | 2.0 | 4.0 | 35 |
| 3 | 13366 | 5 | 85 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 15.0 | 1.0 | 1.0 | 0.0 | 52 |
| 4 | 15670 | 95 | 67 | 0.0 | 0 | 0.0 | 0.0 | 5.0 | 15.0 | 4.0 | 5.0 | 0.0 | 154 |
| 5 | 3857 | 48 | 82 | 1.0 | 0 | 2.0 | 2.0 | 5.0 | 17.0 | 3.0 | 5.0 | 4.0 | 49 |
| 6 | 17115 | 64 | 14 | 0.0 | 0 | 2.0 | 2.0 | 5.0 | 20.0 | 2.0 | 5.0 | 5.0 | 38 |
| 7 | 11516 | 101 | 40 | 1.0 | 0 | 0.0 | 2.0 | 5.0 | 16.0 | 3.0 | 4.0 | 5.0 | 13 |
| 8 | 9457 | 41 | 24 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 10.0 | 0.0 | 5.0 | 0.0 | 10 |
| 9 | 12825 | 64 | 14 | 1.0 | 0 | 2.0 | 2.0 | 5.0 | 14.0 | 0.0 | 5.0 | 0.0 | 175 |
Last rows
| | df_index | city | city_development_index | gender | relevent_experience | enrolled_university | education_level | major_discipline | experience | company_size | company_type | last_new_job | training_hours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15316 | 12878 | 48 | 82 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 19.0 | 1.0 | 4.0 | 0.0 | 57 |
| 15317 | 12914 | 30 | 78 | 1.0 | 0 | 2.0 | 2.0 | 5.0 | 21.0 | 4.0 | 5.0 | 0.0 | 84 |
| 15318 | 17111 | 5 | 85 | 1.0 | 0 | 1.0 | 0.0 | 5.0 | 21.0 | 6.0 | 5.0 | 4.0 | 46 |
| 15319 | 11576 | 95 | 67 | 0.0 | 0 | 1.0 | 0.0 | 5.0 | 1.0 | 3.0 | 5.0 | 0.0 | 186 |
| 15320 | 5365 | 104 | 27 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 1.0 | 2.0 | 5.0 | 3.0 | 65 |
| 15321 | 10398 | 95 | 67 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 4.0 | 4.0 | 5.0 | 0.0 | 92 |
| 15322 | 859 | 5 | 85 | 0.0 | 0 | 2.0 | 2.0 | 5.0 | 1.0 | 4.0 | 5.0 | 0.0 | 15 |
| 15323 | 10566 | 74 | 75 | 0.0 | 0 | 2.0 | 2.0 | 5.0 | 5.0 | 1.0 | 5.0 | 0.0 | 33 |
| 15324 | 3085 | 64 | 14 | 0.0 | 0 | 2.0 | 0.0 | 5.0 | 6.0 | 0.0 | 1.0 | 1.0 | 110 |
| 15325 | 3019 | 5 | 85 | 1.0 | 0 | 2.0 | 2.0 | 5.0 | 2.0 | 1.0 | 5.0 | 3.0 | 81 |